ICICLES: Self-Tuning Samples for Approximate Query Answering
Abstract
Approximate query answering systems provide very fast alternatives to OLAP systems when applications can tolerate small errors in query answers. Current sampling-based approaches to approximately answering aggregate queries over foreign key joins share one drawback: all tuples in a relation are deemed equally important for answering queries even though, in reality, OLAP queries exhibit locality in their data access. Consequently, these approaches may waste precious sample space on tuples that are never or only rarely required. In this paper, we introduce icicles, a new class of samples that tune themselves to a dynamic workload. Intuitively, the probability of a tuple being present in an icicle is proportional to its importance for answering queries in the workload. An icicle therefore contains more tuples from the subsets of the relation that are required to answer more queries in the workload. Consequently, approximate answers obtained by using icicles are more accurate than those obtained from a static uniform random sample. We show analytically that, for a certain class of queries reflected by the workload, icicles yield more accurate answers. In a detailed experimental study, we examine the validity and performance of icicles.
Supported by a Microsoft Graduate Fellowship.
Proceedings of the 26th VLDB Conference, Cairo, Egypt, 2000.
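The intuition in the abstract (a tuple's chance of being sampled grows with how often the workload needs it) can be illustrated with a minimal sketch. This is not the icicle maintenance algorithm from the paper. It assumes a relation held in memory as a list of dicts, a workload given as Python predicates, sampling with replacement, and a "+ 1" smoothing term so that untouched tuples keep a nonzero probability. All function names, the `id` and `amount` columns, and the toy data are hypothetical.

```python
import random
from collections import Counter


def access_counts(relation, workload):
    """For each row index, count how many workload predicates select that row."""
    counts = Counter()
    for predicate in workload:
        for idx, row in enumerate(relation):
            if predicate(row):
                counts[idx] += 1
    return counts


def draw_biased_sample(relation, counts, size, rng):
    """Draw `size` row indices with replacement.

    Row i is drawn with probability proportional to counts[i] + 1. The "+ 1"
    is an assumption of this sketch so that rows no query has touched yet
    can still be sampled.
    """
    weights = [counts[idx] + 1 for idx in range(len(relation))]
    total = sum(weights)
    probs = [w / total for w in weights]
    sample = rng.choices(range(len(relation)), weights=weights, k=size)
    return sample, probs


def estimate_sum(relation, sample, probs, predicate, column):
    """Estimate SUM(column) over rows matching `predicate` from a biased sample.

    Each sampled row is scaled by 1 / (its draw probability), which corrects
    for the non-uniform sampling (the Hansen-Hurwitz estimator).
    """
    total = 0.0
    for idx in sample:
        row = relation[idx]
        if predicate(row):
            total += row[column] / probs[idx]
    return total / len(sample)


# Toy data (hypothetical): 1000 rows, and a workload that only touches ids < 100.
relation = [{"id": i, "amount": i} for i in range(1000)]
hot = lambda row: row["id"] < 100
counts = access_counts(relation, [hot] * 20)
sample, probs = draw_biased_sample(relation, counts, 500, random.Random(0))

print(sum(1 for i in sample if i < 100), "of", len(sample), "sampled rows are in the hot region")
print("estimated SUM:", estimate_sum(relation, sample, probs, hot, "amount"))
print("exact SUM:", sum(row["amount"] for row in relation if hot(row)))
```

In this toy setup the hot region holds 10% of the rows but receives 70% of the sampling weight, so queries that resemble the workload are answered from many more relevant tuples than a uniform sample of the same size would supply.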
Similar resources
Data Sketch/Synopsis
1. Aggarwal C.C. On biased reservoir sampling in the presence of stream evolution. In Proc. 32nd Int. Conf. on Very Large Data Bases, 2006. 2. Chaudhuri S. et al. Overcoming limitations of sampling for aggregation queries. In Proc. 17th Int. Conf. on Data Engineering, 2001. 3. Ganti V., Lee M.-L., and Ramakrishnan R. ICICLES: Self-tuning samples for approximate query answering. In Proc. 28th In...
Approximate Query Processing: Taming the TeraBytes
Garofalakis & Gibbons, VLDB 2001. Outline: • Intro & Approximate Query Answering Overview – Synopses, System architecture, Commercial offerings • One-Dimensional Synopses – Histograms, Samples, Wavelets • Multi-Dimensional Synopses and Joins – Multi-D Histograms, Join synopses, Wavelets • Set-Valued Queries – Using Histograms, Samples, Wavelets • Advanced Techniques & Future Directions – Stre...
Knowledge Driven Query Sharding
We present the idea of an approach to database query sharding that makes use of knowledge about data structure and purpose. It is based on a case study for a database system that contains information about documents. By making use of knowledge about the data structure and the specific top-k queries to be processed we demonstrate a method for avoiding costly and unnecessary steps in query answer...
Experiments on the morphology of icicles.
Icicles form when cool water drips from an overhanging support under ambient conditions which are below freezing. Ice growth is controlled by the removal of latent heat, which is transferred into the surrounding air via a thin film of water flowing over the ice surface. We describe laboratory experiments in which icicles were grown under controlled conditions. We used image analysis to probe th...
Cooperative Query Answering for Approximate Answers with Nearness Measure in Hierarchical Structure Information Systems
Thanit Puthpongsiriporn, Ph.D., University of Pittsburgh. Cooperative query answering for approximate answers has been utilized in various problem domains. Many challenges in manufacturing information retrieval, such as classifying parts into families in group technology implem...
Publication date: 2000